In [61]:
from IPython.display import HTML
HTML('''<script>
code_show=true; 
function code_toggle() {
 if (code_show){
 $('div.input').hide();
 } else {
 $('div.input').show();
 }
 code_show = !code_show
} 
$( document ).ready(code_toggle);
</script>
The raw code for this IPython notebook is by default hidden for easier reading.
To toggle on/off the raw code, click <a href="javascript:code_toggle()">here</a>.''')
Out[61]:
The raw code for this IPython notebook is by default hidden for easier reading. To toggle on/off the raw code, click here.

Key Driver Analysis

Driver Analysis is a powerful tool that can help you understand the factors that influence loyalty. Driver Analysis attempts to identify the attributes that are most correlated with loyalty (as measured by NPS), and illustrates areas where you are under (or over) delivering. This information can then be used to prioritize the investment of capital, time, and resources into areas that will yield the highest return in customer loyalty.

A key driver analysis investigates the relationships between potential drivers and customer behavior such as the likelihood of a positive recommendation, overall satisfaction, or propensity to buy a product. It often uses data collected from a questionnaire, which might ask for a customer’s demographics, their level of satisfaction with various aspects of your company’s services (e.g., whether it was value for money, or whether the customer services department was helpful), as well as their likelihood of recommending your company to others.
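As a quick illustration of the NPS metric used as the loyalty measure here, a minimal sketch with made-up ratings: respondents scoring 9-10 are promoters, 7-8 passives, 0-6 detractors, and NPS is the promoter share minus the detractor share, in percentage points.

```python
import pandas as pd

# Hypothetical 0-10 likelihood-to-recommend ratings (illustrative only)
ratings = pd.Series([10, 9, 9, 8, 7, 6, 3, 10, 2, 9])

# Standard NPS buckets: 0-6 detractor, 7-8 passive, 9-10 promoter
segment = pd.cut(ratings, bins=[-1, 6, 8, 10],
                 labels=['Detractor', 'Passive', 'Promoter'])

# NPS = % promoters - % detractors
share = segment.value_counts(normalize=True)
nps = (share['Promoter'] - share['Detractor']) * 100
print(round(nps, 1))  # → 20.0
```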

In [22]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
#import plotly.plotly as py
#import plotly
#import plotly.graph_objs as go
from datetime import datetime
from dateutil.parser import parse
import warnings
warnings.filterwarnings('ignore')
from sklearn.linear_model import LogisticRegression
from sklearn.linear_model import LinearRegression
import statistics

Data

In [24]:
file = pd.read_excel("C:/Users/abhinav.gaharwar/Downloads/key driver analysis/data.xlsx")
file.head(5)
Out[24]:
CSPID AVAYAID BPERNR/VOCID Survey Count Employee CA% NPS Score Resolution Rep Sat Score Talk Time AHT TSR FCR %
0 1609717 4580026 92479523 15 Harden, Andrea 0.849535 6.666667 80.0 86.666667 378539 877.158590 1.096491 0.928412
1 1934713 4580031 92430659 11 Bittner, Alesia 0.749727 9.090909 80.0 90.909091 267837 827.243976 0.000000 0.927052
2 2348807 4580033 92478986 4 King, Amber 0.197649 -75.000000 50.0 50.000000 55404 721.555556 0.000000 0.888889
3 2326086 4580036 92479012 2 Love, Aushaila 0.721818 50.000000 100.0 100.000000 55286 936.590164 0.000000 0.903226
4 1646179 4580082 92479056 5 Lindsey, Cynthia 0.895914 0.000000 100.0 80.000000 139609 821.664804 0.000000 0.883495
In [25]:
# file.columns = ['primary rating' if col.startswith('Primary Question') else col for col in file.columns]
# file['Request Date']=pd.to_datetime(file['Request Date'],format='%Y-%m-%d %H:%M:%S')
# file['ReqDate'] = file['Request Date'].apply(lambda x : x.date())
# file['Responded on']=pd.to_datetime(file['Responded on'],format='%Y-%m-%d %H:%M:%S')
# file['ResDate'] = file['Responded on'].apply(lambda x : x.date())
# file['Customer Type'] = file['primary rating'].apply(lambda x: "Detractor" if x>=0 and x<=6 else ("Passive" if x>=7 and x<=8 else "Promoter"))
# file['Month']=file['Request Date'].apply(lambda x: x.month_name())
# #file.head(5)
In [26]:
# file_byCType = file[['Customer Type','ResDate']].groupby(['ResDate','Customer Type']).size()
# file_byCType = file_byCType.to_frame()
# file_byCType = file_byCType.reset_index()
# file_byCType.columns = ['ResDate', 'Customer Type', 'Count']
# #file_byCType.head()
In [27]:
# piv = file_byCType.pivot(index='ResDate', columns='Customer Type', values='Count').reset_index()
# piv['Total'] = piv['Detractor'] + piv['Passive'] + piv['Promoter']
# piv['NPS Score'] = round(((piv['Promoter'] - piv['Detractor']) / piv['Total'])*100, 2)
# piv['Detractor_%'] = round((piv['Detractor'] / piv['Total']) * 100, 2)
# piv['Detractor_%'] = [str(i)+'%' for i in piv['Detractor_%']]
# piv['Passive_%'] = round((piv['Passive'] / piv['Total']) * 100, 2)
# piv['Passive_%'] = [str(i)+'%' for i in piv['Passive_%']]
# piv['Promoter_%'] = round((piv['Promoter'] / piv['Total']) * 100, 2)
# piv['Promoter_%'] = [str(i)+'%' for i in piv['Promoter_%']]
# #piv.head(10)
In [28]:
# data=file[['Product Range','Staff Friendliness','Trial Room','Billing Experience','Ambience and Environment','primary rating','ResDate']]
# data['Customer Type'] = data['primary rating'].apply(lambda x: "Detractor" if x>=0 and x<=6 else ("Passive" if x>=7 and x<=8 else "Promoter"))
In [29]:
file=file.fillna(0)
In [30]:
categorical_list = []
numerical_list = []
for i in file.columns.tolist():
    if file[i].dtype=='object':
        categorical_list.append(i)
    else:
        numerical_list.append(i)
print('Number of categorical features:', str(len(categorical_list)))
print('Number of numerical features:', str(len(numerical_list)))
Number of categorical features: 1
Number of numerical features: 12

Collinearity Measurement

Correlation is a statistical technique that can show whether and how strongly pairs of variables are related.

A perfect positive correlation means the correlation coefficient is exactly +1: as one variable moves, up or down, the other moves in lockstep in the same direction. A perfect negative correlation (-1) means the two variables move in opposite directions, while a zero correlation implies no linear relationship at all.

Collinearity means two variables are near-perfect linear combinations of one another; multicollinearity involves more than two variables. In the presence of multicollinearity, regression estimates are unstable and have high standard errors.
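Multicollinearity can be quantified with the variance inflation factor (VIF): regress each predictor on the others and compute 1 / (1 - R²). A minimal sketch on synthetic data (the column names and values here are illustrative, not from the survey data); a VIF well above 10 is a common warning sign.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Toy predictors: x2 is nearly a linear combination of x1, so its VIF is high
rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = 2 * x1 + rng.normal(scale=0.1, size=200)
x3 = rng.normal(size=200)
X = pd.DataFrame({'x1': x1, 'x2': x2, 'x3': x3})

# VIF_i = 1 / (1 - R^2) from regressing predictor i on the remaining predictors
for col in X.columns:
    others = X.drop(columns=col)
    r2 = LinearRegression().fit(others, X[col]).score(others, X[col])
    print(col, round(1 / (1 - r2), 1))
```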

In [31]:
data=file[['CA%','NPS Score','Resolution','Rep Sat Score','Talk Time','AHT','TSR','FCR %']]
# data['Customer Type'] = data['primary rating'].apply(lambda x: "Detractor" if x>=0 and x<=6 else ("Passive" if x>=7 and x<=8 else "Promoter"))

Collinearity in terms of Categorical Variable

In [32]:
import pandas_profiling as pp
pp.ProfileReport(data)



Out[32]:

Collinearity in terms of Numerical Variables

In [33]:
data.corr()
Out[33]:
CA% NPS Score Resolution Rep Sat Score Talk Time AHT TSR FCR %
CA% 1.000000 0.073571 0.155354 0.070026 0.533714 0.119967 0.127139 0.231693
NPS Score 0.073571 1.000000 0.489419 0.506638 0.083310 0.275302 0.107182 0.151586
Resolution 0.155354 0.489419 1.000000 0.594962 -0.014194 0.129983 0.127245 0.236979
Rep Sat Score 0.070026 0.506638 0.594962 1.000000 -0.046330 0.248045 0.166103 0.326610
Talk Time 0.533714 0.083310 -0.014194 -0.046330 1.000000 -0.080923 -0.001574 0.113320
AHT 0.119967 0.275302 0.129983 0.248045 -0.080923 1.000000 0.370425 0.428767
TSR 0.127139 0.107182 0.127245 0.166103 -0.001574 0.370425 1.000000 0.253614
FCR % 0.231693 0.151586 0.236979 0.326610 0.113320 0.428767 0.253614 1.000000
In [ ]:
 
In [ ]:
 

Driver Weights

The key output from driver analysis is a measure of the relative importance of each of the predictor variables in predicting the outcome variable. These importance scores are also known as importance weights.
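One common way to derive such importance weights (a sketch of the general idea, separate from the chi2-based feature scoring this notebook uses) is to regress the outcome on standardized predictors and treat the absolute coefficients as relative weights. The data and column names below are synthetic, for illustration only.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler

# Toy data: 'resolution' has roughly 3x the true influence of 'rep_sat'
rng = np.random.default_rng(1)
X = pd.DataFrame({'resolution': rng.normal(size=300),
                  'rep_sat': rng.normal(size=300)})
y = 3 * X['resolution'] + 1 * X['rep_sat'] + rng.normal(size=300)

# Standardize predictors so coefficient magnitudes are comparable,
# then rescale the absolute coefficients to sum to 100
Xz = StandardScaler().fit_transform(X)
coefs = np.abs(LinearRegression().fit(Xz, y).coef_)
weights = pd.Series(100 * coefs / coefs.sum(), index=X.columns)
print(weights.round(1))
```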

In [43]:
from sklearn.feature_selection import SelectKBest
from sklearn.feature_selection import chi2
from sklearn.preprocessing import LabelEncoder
In [44]:
data1=file
my_df = data1.apply(LabelEncoder().fit_transform)
my_df.head()
Out[44]:
CSPID AVAYAID BPERNR/VOCID Survey Count Employee CA% NPS Score Resolution Rep Sat Score Talk Time AHT TSR FCR %
0 75 0 121 14 83 115 40 31 35 202 169 89 174
1 119 1 86 10 16 76 42 31 43 159 147 0 171
2 163 2 117 3 112 0 2 7 4 18 93 0 76
3 161 3 118 1 130 66 71 57 51 17 182 0 113
4 78 4 120 4 123 151 35 57 24 76 144 0 66
In [45]:
Colnames = ['CA%','Resolution','Rep Sat Score','Talk Time','AHT','TSR','FCR %']
In [46]:
X = my_df[['CA%','Resolution','Rep Sat Score','Talk Time','AHT','TSR','FCR %']]      #independent columns
y = my_df[['NPS Score']]                                                             #target column i.e. NPS Score

#apply SelectKBest class to extract top 5 best features

bestfeatures = SelectKBest(score_func=chi2, k=5)
fit = bestfeatures.fit(X,y)
dfscores = pd.DataFrame(fit.scores_)
dfcolumns = pd.DataFrame(X.columns)
#concat two dataframes for better visualization 
featureScores = pd.concat([dfcolumns,dfscores],axis=1)
featureScores.columns = ['Specs','Score']  #naming the dataframe columns
print(featureScores.nlargest(5,'Score'))  #print 5 best features
       Specs        Score
3  Talk Time  5432.095779
4        AHT  3526.509832
0        CA%  3447.256616
5        TSR  3061.093903
6      FCR %  3043.640190
In [58]:
# from sklearn.ensemble import ExtraTreesClassifier
# import matplotlib.pyplot as plt
# model = ExtraTreesClassifier()
# model.fit(X,y)
# print(model.feature_importances_) #use inbuilt class feature_importances of tree based classifiers
# #plot graph of feature importances for better visualization
# feat_importances = pd.Series(model.feature_importances_, index=X.columns)
# feat_importances.nlargest(5).plot(kind='barh')
# plt.show()

Performance

Performance reflects the average rating agents attained on each variable, based on their work, in the following order:

1) Talk Time

2) AHT

3) CA %

4) TSR

5) FCR %

In [18]:
##Colnames = ['Product Range','Staff Friendliness','Trial Room','Billing Experience','Ambience and Environment']
In [48]:
df = pd.read_excel("C:/Users/abhinav.gaharwar/Downloads/key driver analysis/data.xlsx")
In [51]:
# Mean (label-encoded) score for each performance driver, in the order above
for col in ['Talk Time', 'AHT', 'CA%', 'TSR', 'FCR %']:
    values = [float(x) for x in my_df[col]]
    print(np.mean(values))
118.04583333333333
118.59583333333333
117.73333333333333
56.80833333333333
114.3375

Plotting

Driver Weight vs Performance

In [52]:
data = {'Impact on NPS':[5432.1,3526.5,3447.25,3061.1,3043.6], 'Attributes':['Talk Time','AHT','CA %','TSR','FCR'],'Performance':[118,118.6,117.7,56.8,114.3]}
df = pd.DataFrame(data) 
df
Out[52]:
Impact on NPS Attributes Performance
0 5432.10 Talk Time 118.0
1 3526.50 AHT 118.6
2 3447.25 CA % 117.7
3 3061.10 TSR 56.8
4 3043.60 FCR 114.3
In [60]:
import plotly.express as px
fig = px.scatter(df, x="Performance", y="Impact on NPS", color="Attributes",
                 size='Impact on NPS', hover_data=['Performance'])
fig.show()

The x-axis represents Performance; the y-axis represents Impact on NPS.

AHT, CA%, and FCR % are performing well but carry low weight, so they have relatively little impact on NPS. Talk Time is the most important of the variables: both its impact on NPS and its performance are considerable.

Improving Talk Time can therefore improve the NPS by a significant amount.

In [ ]:
 
In [ ]: